3. Conditional Probability Tables

This notebook shows how to create the conditional probability tables (CPTs) for the Student example from Koller & Friedman.

In [1]:

%run '_preamble.ipynb'

Python version: 3.8.10
The autoreload extension is already loaded. To reload it, use:
  %reload_ext autoreload

available imports:
  import os
  import logging
  import pandas as pd
  import numpy as np

connect to this kernel with:
  jupyter console --existing 9fa5c31c-e49b-417e-882b-5f4ace153127

Could not create logging directory "../logs"
Logging to: "../logs/notebook.log"
Current date/time: 11-06-2021, 21:27
Current working directory: "/Users/melle/software-development/thomas-master/notebooks"

In [2]:

from thomas.core.factors import CPT

from IPython.display import display, HTML

In [3]:

def subset(full_dict, keys):
    """Return a subset of a dict."""
    return {k: full_dict[k] for k in keys}

In [4]:

# We're defining CPTs for multiple random variables. The dictionary
# `states` keeps track of the states each variable can take on.
states = {
    'I': ['i0', 'i1'],
    'S': ['s0', 's1'],
    'D': ['d0', 'd1'],
    'G': ['g1', 'g2', 'g3'],
    'L': ['l0', 'l1'],
}

# We'll store the CPTs in a dict, indexed by the name of the 
# conditioned variable.
P = dict()

# Create the CPT for random variable I. Since I is not conditioned on any other
# variable, this table holds prior probabilities rather than conditional ones.
P['I'] = CPT(
    [0.7, 0.3], 
    states=subset(states, ['I']),
    description='Intelligence'
)

In [5]:

# Display the CPT for random variable 'I': intelligence. The variable's states
# are listed as columns.
P['I']

Out[5]:

P(I) Intelligence

I	i0	i1
	0.7	0.3

In [6]:

# Create the CPT for random variable 'S'. The probabilities for S are conditional
# on I: the CPT defines the distribution of S given I, written P(S|I). Each row
# of values below corresponds to one state of I.
P['S'] = CPT(
    [0.95, 0.05, 
     0.20, 0.80], 
    states=subset(states, ['I', 'S']),
    description='SAT Score'
)

In [7]:

# Display the CPT for random variable 'S': SAT Score. Again, the variable's 
# states are listed as columns. The conditioning variables' states are listed
# as rows.
P['S']

Out[7]:

P(S|I) SAT Score

S	s0	s1
I
i0	0.95	0.05
i1	0.20	0.80
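
Each row of the table above is a distribution over S for one state of I, so each row should sum to 1. The following is a minimal sketch of that check using plain dictionaries; it does not use the `CPT` class, and the helper name `rows_sum_to_one` is a hypothetical name chosen for illustration.

```python
import math

# P(S|I) from the cell above, keyed by the state of the conditioning variable I.
# Plain dicts are used here; this sketch does not touch the CPT class.
p_s_given_i = {
    'i0': {'s0': 0.95, 's1': 0.05},
    'i1': {'s0': 0.20, 's1': 0.80},
}


def rows_sum_to_one(table, tol=1e-9):
    """Return True if every conditional distribution in `table` sums to 1."""
    return all(
        math.isclose(sum(row.values()), 1.0, abs_tol=tol)
        for row in table.values()
    )


print(rows_sum_to_one(p_s_given_i))
```

The same check applies to any of the tables in this notebook: one row per combination of conditioning states, each summing to 1.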

In [8]:

# Internally, P['S'] is essentially a factor with a multi-level index over (I, S)
print(P['S'])

P(S|I)
I   S 
i0  s0    0.95
    s1    0.05
i1  s0    0.20
    s1    0.80
dtype: float64

In [9]:

# Create the remainder of the CPTs
P['D'] = CPT(
    [0.6, 0.4], 
    states=subset(states, ['D']),
    description='Difficulty'
)

P['G'] = CPT(
    [0.30, 0.40, 0.30, 
     0.05, 0.25, 0.70, 
     0.90, 0.08, 0.02, 
     0.50, 0.30, 0.20],
    states=subset(states, ['I', 'D', 'G']),
    description='Grade'
)

P['L'] = CPT(
    [0.10, 0.90,
     0.40, 0.60,
     0.99, 0.01],
    states=subset(states, ['G', 'L']),
    description='Letter'
)

In [10]:

# There can, of course, be more than one conditioning variable
P['G']

Out[10]:

P(G|I,D) Grade

	G	g1	g2	g3
I	D
i0	d0	0.30	0.40	0.30
i0	d1	0.05	0.25	0.70
i1	d0	0.90	0.08	0.02
i1	d1	0.50	0.30	0.20
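
Comparing the flat list of values passed for 'G' with the table above suggests that the values are laid out in row-major order over (I, D, G), with the last variable varying fastest. The sketch below reproduces that mapping with `itertools.product`. It is inferred from the displayed output and is not a description of the library's internals.

```python
from itertools import product

# States and flat values copied from the cells above (only I, D and G are needed).
states = {
    'I': ['i0', 'i1'],
    'D': ['d0', 'd1'],
    'G': ['g1', 'g2', 'g3'],
}

flat = [0.30, 0.40, 0.30,
        0.05, 0.25, 0.70,
        0.90, 0.08, 0.02,
        0.50, 0.30, 0.20]

# Assumed variable order: conditioning variables first, conditioned variable last.
order = ['I', 'D', 'G']

# product() varies the last variable fastest, i.e. row-major order.
keys = list(product(*(states[v] for v in order)))
table = dict(zip(keys, flat))

# Look up P(G=g1 | I=i1, D=d0).
print(table[('i1', 'd0', 'g1')])
```

Writing the values three per line, as in the cell that defines P['G'], makes each line of the list correspond to one row of the displayed table.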

In [11]:

# Individual probabilities can be looked up by state through indexing (__getitem__):
P['I']['i0']

Out[11]:

0.7

In [12]:

# A CPT can also be converted to a factor over all of its variables
P['S'].as_factor()

Out[12]:

factor(I,S)
I   S 
i0  s0    0.95
    s1    0.05
i1  s0    0.20
    s1    0.80
dtype: float64
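
Viewed as a factor over (I, S), the table for S can be combined with the prior on I to give the joint distribution P(I, S) and, by summing out I, the marginal P(S). The sketch below does this arithmetic with plain dictionaries, using the values shown above. It illustrates the calculation only and makes no assumption about how the library multiplies or marginalizes factors.

```python
import math

# Prior P(I) and conditional P(S|I), copied from the tables above.
p_i = {'i0': 0.7, 'i1': 0.3}
p_s_given_i = {
    ('i0', 's0'): 0.95, ('i0', 's1'): 0.05,
    ('i1', 's0'): 0.20, ('i1', 's1'): 0.80,
}

# Joint: P(I, S) = P(I) * P(S|I)
joint = {(i, s): p_i[i] * p for (i, s), p in p_s_given_i.items()}

# Marginal: P(S) = sum over I of P(I, S)
marginal_s = {}
for (i, s), p in joint.items():
    marginal_s[s] = marginal_s.get(s, 0.0) + p

# The joint should be a proper distribution over all (I, S) pairs.
print(math.isclose(sum(joint.values()), 1.0))
print({s: round(p, 3) for s, p in marginal_s.items()})
```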